近年来,面部语义指导(包括面部地标,面部热图和面部解析图)和面部生成对抗网络(GAN)近年来已广泛用于盲面修复(BFR)。尽管现有的BFR方法在普通案例中取得了良好的性能,但这些解决方案在面对严重降解和姿势变化的图像时具有有限的弹性(例如,在现实世界情景中看起来右,左看,笑等)。在这项工作中,我们提出了一个精心设计的盲人面部修复网络,具有生成性面部先验。所提出的网络主要由非对称编解码器和stylegan2先验网络组成。在非对称编解码器中,我们采用混合的多路残留块(MMRB)来逐渐提取输入图像的弱纹理特征,从而可以更好地保留原始面部特征并避免过多的幻想。 MMRB也可以在其他网络中插入插件。此外,多亏了StyleGAN2模型的富裕和多样化的面部先验,我们采用了微调的方法来灵活地恢复自然和现实的面部细节。此外,一种新颖的自我监督训练策略是专门设计用于面部修复任务的,以使分配更接近目标并保持训练稳定性。关于合成和现实世界数据集的广泛实验表明,我们的模型在面部恢复和面部超分辨率任务方面取得了卓越的表现。
translated by 谷歌翻译
视觉地点识别(VPR)是一个具有挑战性的任务,具有巨大的计算成本与高识别性能之间的不平衡。由于轻质卷积神经网络(CNNS)和局部聚合描述符(VLAD)层向量的火车能力的实用特征提取能力,我们提出了一种由前部组成的轻量级弱监管的端到端神经网络-anded的感知模型称为ghostcnn和学习的VLAD层作为后端。 Ghostcnn基于幽灵模块,这些模块是基于重量的CNN架构。它们可以使用线性操作而不是传统的卷积过程生成冗余特征映射,从而在计算资源和识别准确性之间进行良好的权衡。为了进一步增强我们提出的轻量级模型,我们将扩张的卷曲添加到Ghost模块中,以获取包含更多空间语义信息的功能,提高准确性。最后,在常用的公共基准和我们的私人数据集上进行的丰富实验验证了所提出的神经网络,分别将VGG16-NetVlad的拖鞋和参数减少了99.04%和80.16%。此外,两种模型都达到了类似的准确性。
translated by 谷歌翻译
Spatio-temporal modeling as a canonical task of multivariate time series forecasting has been a significant research topic in AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea into Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner powered by a Meta-Node Bank into GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variaty of non-stationary phenomena. Our model outperformed the state-of-the-arts to a large degree on all three datasets (over 27% MAE and 34% RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and be robustly adaptive to different anomalous situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
translated by 谷歌翻译
Spatial-temporal (ST) graph modeling, such as traffic speed forecasting and taxi demand prediction, is an important task in deep learning area. However, for the nodes in graph, their ST patterns can vary greatly in difficulties for modeling, owning to the heterogeneous nature of ST data. We argue that unveiling the nodes to the model in a meaningful order, from easy to complex, can provide performance improvements over traditional training procedure. The idea has its root in Curriculum Learning which suggests in the early stage of training models can be sensitive to noise and difficult samples. In this paper, we propose ST-Curriculum Dropout, a novel and easy-to-implement strategy for spatial-temporal graph modeling. Specifically, we evaluate the learning difficulty of each node in high-level feature space and drop those difficult ones out to ensure the model only needs to handle fundamental ST relations at the beginning, before gradually moving to hard ones. Our strategy can be applied to any canonical deep learning architecture without extra trainable parameters, and extensive experiments on a wide range of datasets are conducted to illustrate that, by controlling the difficulty level of ST relations as the training progresses, the model is able to capture better representation of the data and thus yields better generalization.
translated by 谷歌翻译
Traffic forecasting as a canonical task of multivariate time series forecasting has been a significant research topic in AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in the traffic stream, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea into Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner powered by a Meta-Node Bank into GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset in which traffic incident information is contained. Our model outperformed the state-of-the-arts to a large degree on all three datasets (over 27% MAE and 34% RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle the road links and time slots with different patterns and be robustly adaptive to any anomalous traffic situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
translated by 谷歌翻译
本文旨在解释刚被二进制标签监督时,深泡检测模型如何学习图像的人工制品特征。为此,从图像匹配的角度提出了三个假设,如下所示。 1. DeepFake检测模型指出了基于既不是与源相关又不相关的视觉概念的真实/假图像,也就是说,考虑到与伪影这样的视觉概念。 2.除了对二进制标签的监督外,DeepFake检测模型还通过训练集中的FST匹配(即匹配的伪造,源,目标图像)隐含地学习与伪影相关的视觉概念。 3.通过原始训练集中的FST匹配,隐式学习的人工构图概念容易受到视频压缩的影响。在实验中,在各种DNN中验证了上述假设。此外,基于这种理解,我们提出了FST匹配的DeepFake检测模型,以提高压缩视频中伪造检测的性能。实验结果表明,我们的方法实现了出色的性能,尤其是在高度压缩的(例如C40)视频上。
translated by 谷歌翻译
作为一个决定性的部分,在移动式服务(MAA)的成功中,人群运动的时空预测建模是一个具有挑战性的任务,特别是考虑到社会事件驱动偏离正常性的移动性行为的情景。虽然已经进行了深入学习的高级时空态度,但大多数情况下都是巨大进展,如果不是所有现有方法都不知道多种传输模式之间的动态相互作用,也不是对潜在的社会事件带来的前所未有的波动性。在本文中,我们的动力是从两个视角改善规范时空网络(ST-Net):(1)设计异质移动信息网络(Hmin),明确地在多模式移动性中明确代表差异; (2)提出内存增强的动态滤波器发生器(MDFG),以产生各种场景的动态方式生成序列特定参数。增强的事件感知的时空网络,即East-Net,在几个现实世界数据集中评估了各种各样的社会事件的繁多和覆盖范围。与最先进的基线相比,定量和定性实验结果验证了我们方法的优势。代码和数据在https://github.com/dunderdoc-wang/east-net上发布。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译